Abstract

Kingman derived the Ewens sampling formula for random partitions from the genealogy
model defined by a Poisson process of mutations along lines of descent governed by a simple
coalescent process. Möhle described the recursion which determines the generalization of the
Ewens sampling formula when the lines of descent are governed by a coalescent with multiple
collisions. In [7] the authors exploit an analogy with the theory of regenerative composition and
partition structures, and provide various characterizations of the associated exchangeable random
partitions. This paper gives parallel results for the further generalized model with lines of
descent following a coalescent with simultaneous multiple collisions.



1 Introduction 

Given a large population with many generations, we track backward in time the family history of
each individual in the current generation. As we track further back, the family lines coalesce with
each other, eventually all terminating at a common ancestor of the current generation. This model,
used in biological studies of the genealogy of haploid type [?], is the prototype of Kingman's theory
of random coalescent processes [21], [22], [23]. In Kingman's coalescent [21], each collision involves
only two parts. The idea is extended to coalescents with multiple collisions in [33, 34], where every
collision can involve two or more parts. This model is further developed into the theory of coalescents
with simultaneous multiple collisions in [26, 37]. See [?, ?, 11, 13, 35, 38, 40] for related developments.



Kingman [23] indicated a connection between random partitions and coalescent processes. Suppose
that in the above haploid model the family lines of the current generation are modeled by Kingman's
coalescent, and that mutations are applied along the family lines according to a Poisson process with
rate $\theta/2$ for some non-negative real number $\theta$. Define a partition by saying that two
individuals are in the same block if there is no mutation along their family lines before they coalesce.
Then the resulting random partition is governed by the Ewens sampling formula with parameter $\theta$.
See [29, Section 5.1, Exercise 2] and [?], [28] for a review and more on this idea. Recently, Möhle [24]
applied this idea to the genealogy tree modeled by coalescents with multiple collisions and
simultaneous multiple collisions. He studied the resulting family of partitions, and derived a
recursion which determines them. See [?] for more properties of this family.

Dong, Gnedin and Pitman [7] offered a different approach to the family of random partitions
generated by Poisson marking along the lines of descent modeled by a coalescent with multiple
collisions. In their work, each part of a partition is assigned one of two possible states, active or
frozen, and a new class of continuous-time partition-valued coalescent processes, namely coalescents
with freeze, is introduced. Every coalescent with freeze has a terminal state with all blocks frozen,
called the final partition of this process, whose distribution is characterized by Möhle's recursion
[24]. In the spirit of [15], [16], the authors studied the discrete-time chains embedded in the coalescent
with freeze, and from the consistency of their transition operators they derived a backward recursion
satisfied by the decrement matrix, analogous to [?, Theorem 3.3]. This decrement matrix determines
the final partition through Möhle's recursion. An integral representation for the decrement matrix
was derived using algebraic methods as in [?]. Moreover, adapting an idea from [3], the authors
established a uniqueness result by constructing another Markov chain, with state space the set
of partitions of a finite set, whose unique stationary distribution is the law of the final partition
restricted to this set.

As noted in [7], the current paper serves as a supplement which provides the theory for a more
general case. We focus on the family of partitions generated by Poisson marking along the lines of
descent modeled by a Markov coalescent with simultaneous multiple collisions. This family was
characterized by the generalized form of Möhle's recursion [24], while in this paper an analysis
parallel to that of [7] is carried out.

Notation and background are introduced in Section 2, together with a review of Möhle's idea. In
Section 3 the generalized coalescent with freeze is defined, followed by the connection between this
process and the generalized form of Möhle's recursion. Then in Section 4 we study the generalized
freeze-and-merge (FM) operators of the finite discrete-time chain embedded in the coalescent with
freeze, whose consistency with sampling gives a backward recursion for the generalized decrement
matrix. Our main result regarding finite partitions is also stated there. In Section 5 another
partition-valued Markov chain with a generalized sample-and-add (SA) operation is introduced, and
the law of the partition under study is identified as the unique stationary distribution of this chain.
Finally, in Section 6 we derive the integral representation for the generalized infinite decrement
matrix and state our main result regarding infinite partitions. Sections 3-6 of this paper can be seen
as generalizations of the corresponding sections of [7].



2 Notations and background 

Following the notation of [7] and [?], a partition of a finite set $F$ into $\ell$ blocks, also called a
finite set partition, is an unordered collection of non-empty disjoint sets $\{A_1, \ldots, A_\ell\}$
whose union is $F$. Partitions of the set $[n] := \{1, 2, \ldots, n\}$ for $n \in \mathbb{N}$ are of
special interest. Let $\mathcal{P}_{[n]}$ be the set of all partitions of $[n]$. For a positive integer
$n$, a composition of $n$ is an ordered sequence of positive integers $(n_1, n_2, \ldots, n_\ell)$ with
$\sum_{i=1}^{\ell} n_i = n$, where $\ell \in \mathbb{N}$ is the number of parts. Let $\mathcal{C}_n$ be
the set of all compositions of $n$, and let $\mathcal{P}_n$ denote the set of non-increasing
compositions of $n$, also called partitions of $n$.

Take $\pi_n = \{A_1, A_2, \ldots, A_\ell\}$ as a generic partition of $[n]$, which we write as
$\pi_n \vdash [n]$. The shape function from partitions of the set $[n]$ to partitions of the positive
integer $n$ is defined by

\[ \mathrm{shape}(\pi_n) = (|A_1|, |A_2|, \ldots, |A_\ell|)^{\downarrow}, \tag{1} \]

where $|A_i|$ is the size of block $A_i$, that is the number of elements in the block, and
"$\downarrow$" means arranging the sequence of sizes in non-increasing order.
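
As a small illustration of the shape function (1), the following Python sketch represents a set partition as a list of sets; this representation and the function name `shape` are choices made here for illustration only.

```python
def shape(partition):
    """Return the block sizes of a set partition in non-increasing order.

    `partition` is assumed to be an iterable of non-empty, disjoint sets.
    """
    return tuple(sorted((len(block) for block in partition), reverse=True))


# A partition of [8] into six blocks.
example = [{1, 3}, {2}, {4}, {5}, {6, 7}, {8}]
print(shape(example))
```

The result is a partition of the integer 8, that is a non-increasing composition of 8.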

A random partition $\Pi_n$ of $[n]$ is a random variable taking values in $\mathcal{P}_{[n]}$. It is
called exchangeable if its distribution is invariant under the action on partitions of $[n]$ by the
symmetric group of permutations of $[n]$. Equivalently, the distribution of $\Pi_n$ is given by the
formula

\[ P(\Pi_n = \{A_1, A_2, \ldots, A_\ell\}) = p_n(|A_1|, |A_2|, \ldots, |A_\ell|) \tag{2} \]

for some symmetric function $p_n$ of compositions of $n$. This $p_n$ is called the exchangeable
partition probability function (EPPF) of $\Pi_n$.

An exchangeable random partition of $\mathbb{N}$ is a sequence of exchangeable set partitions
$\Pi_\infty = (\Pi_n)_{n=1}^{\infty}$ with $\Pi_n \vdash [n]$, subject to the consistency condition

\[ \Pi_n|_m = \Pi_m, \tag{3} \]

where the restriction operator $|_m$ acts on $\mathcal{P}_{[n]}$, $n > m$, by deleting elements
$m+1, m+2, \ldots, n$. The distribution of such an exchangeable random partition of $\mathbb{N}$ is
determined by the function $p$ defined on the set of all integer compositions
$\mathcal{C}_\infty := \bigcup_{n=1}^{\infty} \mathcal{C}_n$, which coincides with the EPPF $p_n$ of
$\Pi_n$ when acting on $\mathcal{C}_n$. This function $p$ is called the infinite EPPF associated with
$\Pi_\infty$. The consistency condition (3) translates into the following addition rule for the EPPF
$p$: for each positive integer $n$ and each composition $(n_1, n_2, \ldots, n_\ell)$ of $n$,

\[ p(n_1, n_2, \ldots, n_\ell) = p(n_1, n_2, \ldots, n_\ell, 1) + \sum_{i=1}^{\ell} p(n_1, \ldots, n_i + 1, \ldots, n_\ell), \tag{4} \]

where $(n_1, \ldots, n_i + 1, \ldots, n_\ell)$ is formed from $(n_1, \ldots, n_\ell)$ by adding 1 to
$n_i$. Conversely, if a non-negative function $p$ on compositions satisfies (4) and the normalization
condition $p(1) = 1$, then by Kolmogorov's extension theorem there exists an exchangeable random
partition $\Pi_\infty$ with EPPF $p$.
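
The addition rule (4) can be checked on a concrete example. The sketch below uses the standard EPPF of the Ewens sampling formula, $p(n_1, \ldots, n_\ell) = \theta^{\ell} \prod_i (n_i - 1)! / (\theta(\theta+1)\cdots(\theta+n-1))$, with exact rational arithmetic. The function names and the value $\theta = 3/2$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def rising(theta, n):
    """Rising factorial theta (theta + 1) ... (theta + n - 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= theta + i
    return out


def ewens_eppf(comp, theta):
    """Ewens EPPF evaluated at a composition given as a tuple of block sizes."""
    theta = Fraction(theta)
    numerator = theta ** len(comp)
    for part in comp:
        numerator *= factorial(part - 1)
    return numerator / rising(theta, sum(comp))


def addition_rule_rhs(comp, theta):
    """Right-hand side of the addition rule (4) for the Ewens EPPF."""
    total = ewens_eppf(comp + (1,), theta)      # new element forms a singleton
    for i in range(len(comp)):                  # new element joins block i
        bumped = comp[:i] + (comp[i] + 1,) + comp[i + 1:]
        total += ewens_eppf(bumped, theta)
    return total


theta = Fraction(3, 2)  # hypothetical parameter value
comp = (3, 1, 2)
print(ewens_eppf(comp, theta) == addition_rule_rhs(comp, theta))
```

Exact fractions are used so that the two sides of (4) can be compared with `==` rather than up to rounding error.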

Similar definitions apply to a finite sequence of consistent exchangeable random set partitions
$(\Pi_m)_{m=1}^{n}$ with $\Pi_m \vdash [m]$, where $n$ is some fixed positive integer. The finite EPPF
$p$ of such a sequence can be defined as the unique recursive extension of $p_n$ by the addition rule
(4) to all compositions $(n_1, n_2, \ldots, n_\ell)$ of $m < n$.

Let $\mathcal{P}_\infty$ be the set of all partitions of $\mathbb{N}$. We identify each
$\pi_\infty \in \mathcal{P}_\infty$ with the sequence
$(\pi_1, \pi_2, \ldots) \in \mathcal{P}_{[1]} \times \mathcal{P}_{[2]} \times \cdots$, where
$\pi_n = \pi_\infty|_n$ is the restriction of $\pi_\infty$ to $[n]$ obtained by deleting all elements
bigger than $n$. Give $\mathcal{P}_\infty$ the topology it inherits as a subset of
$\mathcal{P}_{[1]} \times \mathcal{P}_{[2]} \times \cdots$ with the product of discrete topologies, so
the space $\mathcal{P}_\infty$ is compact and metrizable. Following [?], [21], [?], call a
$\mathcal{P}_\infty$-valued stochastic process $(\Pi_\infty(t), t \ge 0)$ a coalescent if it has càdlàg
paths and $\Pi_\infty(s)$ is a refinement of $\Pi_\infty(t)$ for every $s < t$. For a non-negative
finite measure $\Lambda$ on the Borel subsets of $[0,1]$, a $\Lambda$-coalescent is a
$\mathcal{P}_\infty$-valued Markov coalescent $(\Pi_\infty(t), t \ge 0)$ whose restriction
$(\Pi_n(t), t \ge 0)$ to $[n]$ is for each $n$ a Markov chain such that when $\Pi_n(t)$ has $b$ blocks,
each $k$-tuple of blocks of $\Pi_n(t)$ merges to form a single block at rate $\lambda_{b,k}$, where

\[ \lambda_{b,k} = \int_0^1 x^{k-2} (1-x)^{b-k} \, \Lambda(dx) \qquad (2 \le k \le b < \infty). \tag{5} \]

When $\Lambda = \delta_0$, this reduces to Kingman's coalescent [21], [22], [23] with only binary merges.
When $\Lambda$ is the uniform distribution on $[0, 1]$, the coalescent is the Bolthausen-Sznitman
coalescent [?].
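
For the two examples just mentioned the integral (5) can be evaluated in closed form. For $\Lambda$ uniform on $[0,1]$ it is a Beta integral, $\lambda_{b,k} = (k-2)!\,(b-k)!/(b-1)!$, and for $\Lambda = \delta_0$ it equals 1 if $k = 2$ and 0 otherwise. The following sketch (function names are illustrative, not from the paper) computes these rates exactly and checks the sampling consistency identity $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$, which follows from (5) by writing $1 = (1-x) + x$ under the integral.

```python
from fractions import Fraction
from math import factorial


def bs_rate(b, k):
    """lambda_{b,k} of (5) for Lambda uniform on [0,1] (Bolthausen-Sznitman)."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    # Beta integral: B(k - 1, b - k + 1) = (k-2)! (b-k)! / (b-1)!
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def kingman_rate(b, k):
    """lambda_{b,k} of (5) for Lambda = delta_0 (Kingman): binary merges only."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return Fraction(1 if k == 2 else 0)


def is_consistent(rate, max_b):
    """Check lambda_{b,k} = lambda_{b+1,k} + lambda_{b+1,k+1} for 2 <= k <= b <= max_b."""
    return all(
        rate(b, k) == rate(b + 1, k) + rate(b + 1, k + 1)
        for b in range(2, max_b + 1)
        for k in range(2, b + 1)
    )


print(is_consistent(bs_rate, 8), is_consistent(kingman_rate, 8))
```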

A key property of the $\Lambda$-coalescent is that the collision rates do not depend on the internal
structure of each block. It is therefore natural to seek a more general class of coalescent processes
which retain this property and undergo "simultaneous multiple collisions". This idea is mentioned
in [33, Section 3.3], and the coalescent process with simultaneous multiple collisions is first obtained
and characterized in [26] by a sequence of measures; then in [37] Schweinsberg finds a more compact
characterization by a single non-negative measure $\Xi$ on the infinite simplex

\[ \Delta = \Big\{ (x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0, \ \sum_{i=1}^{\infty} x_i \le 1 \Big\}. \tag{6} \]

See [?, ?, 24] for various discussions of this process.



To be more specific, following the notation in [37], for a generic set partition with
$b \in \mathbb{N}$ blocks, let $(k_1, k_2, \ldots, k_r, s)$ be a sequence of integers with $s \ge 0$,
$r \ge 1$, $k_i \ge 2$ for $i = 1, 2, \ldots, r$, and $s + \sum_{i=1}^{r} k_i = b$. Define a
$(b; k_1, k_2, \ldots, k_r; s)$-collision to be a merge of $b$ blocks into $r + s$ blocks in which $s$
blocks remain unchanged and the other $r$ blocks contain $k_1, k_2, \ldots, k_r$ of the original
blocks. The order of $k_1, k_2, \ldots, k_r$ does not matter. For example, take the original partition
as $\{\{1,3\},\{2\},\{4\},\{5\},\{6,7\},\{8\}\}$; then the partition $\{\{1,2,3,5\},\{4,6,7\},\{8\}\}$ is
a $(6; 3, 2; 1)$-collision of the original partition and also a $(6; 2, 3; 1)$-collision of it. For any
generic set partition with $b$ blocks, the number of possible $(b; k_1, k_2, \ldots, k_r; s)$-collisions is

\[ d(b; k_1, k_2, \ldots, k_r; s) := \binom{b}{k_1, \ldots, k_r, s} \prod_{j=2}^{b} \frac{1}{l_j!}, \tag{7} \]

where $l_j := \#\{i : k_i = j\}$.
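
The count of collisions can be made concrete with a short computation. The sketch below assumes the standard form of this count, a multinomial coefficient $b!/(k_1! \cdots k_r!\, s!)$ divided by $\prod_j l_j!$ to account for groups of equal size being interchangeable; the function name `collision_count` is hypothetical.

```python
from collections import Counter
from math import factorial


def collision_count(b, ks, s):
    """Number of (b; k_1, ..., k_r; s)-collisions of a partition with b blocks.

    `ks` lists the group sizes k_1, ..., k_r (each >= 2); `s` blocks stay unchanged.
    """
    if not ks or any(k < 2 for k in ks) or s < 0 or s + sum(ks) != b:
        raise ValueError("need r >= 1, all k_i >= 2, s >= 0 and s + sum(k_i) = b")
    denominator = factorial(s)
    for k in ks:
        denominator *= factorial(k)          # order inside a merging group is irrelevant
    for multiplicity in Counter(ks).values():
        denominator *= factorial(multiplicity)  # groups of equal size are interchangeable
    return factorial(b) // denominator


print(collision_count(6, [3, 2], 1))
print(collision_count(4, [2, 2], 0))
```

For instance, with $b = 4$ and two groups of size 2 the count is 3, matching the three ways of splitting four blocks into two unordered pairs.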

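Let $\Xi$ be a non-negative finite measure on the infinite simplex $\Delta$ of the form
$\Xi = \Xi_0 + a\delta_0$, where $\Xi_0$ has no atom at zero and $\delta_0$ is a unit mass at zero. A
$\Xi$-coalescent starting from a generic infinite partition $\pi_\infty \in \mathcal{P}_\infty$ is a
$\mathcal{P}_\infty$-valued coalescent $(\Pi_\infty(t), t \ge 0)$ with $\Pi_\infty(0) = \pi_\infty$ such
that, for each positive integer $n$, the restricted process
$(\Pi_n(t), t \ge 0) := (\Pi_\infty(t), t \ge 0)|_n$ is a $\mathcal{P}_{[n]}$-valued Markov chain in
which, when $\Pi_n(t)$ has $b$ blocks, each possible $(b; k_1, k_2, \ldots, k_r; s)$-collision happens at
rate $\lambda_{b; k_1, k_2, \ldots, k_r; s}$, defined as the integral

\[ \lambda_{b; k_1, \ldots, k_r; s} = \int_{\Delta} \frac{\sum_{l=0}^{s} \sum_{i_1 \ne \cdots \ne i_{r+l}} \binom{s}{l} x_{i_1}^{k_1} \cdots x_{i_r}^{k_r} x_{i_{r+1}} \cdots x_{i_{r+l}} \big(1 - \sum_{j=1}^{\infty} x_j\big)^{s-l}}{\sum_{j=1}^{\infty} x_j^2} \, \Xi_0(dx) + a \mathbf{1}_{\{r = 1,\, k_1 = 2\}}. \tag{8} \]

It is called a standard $\Xi$-coalescent if the starting state is $\pi_\infty = \Sigma_\infty$, the
partition of $\mathbb{N}$ into singletons.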

The measure $\Xi$ which characterizes the coalescent is derived from the consistency requirement:
for any positive integers $0 < m < n < \infty$ and $\pi_n \vdash [n]$, the restricted process
$(\Pi_n(t)|_m, t \ge 0)$ given $\Pi_n(0) = \pi_n$ has the same law as $(\Pi_m(t), t \ge 0)$ given
$\Pi_m(0) = \pi_n|_m$. This condition is fulfilled if and only if the array of rates
$(\lambda_{n; k_1, k_2, \ldots, k_r; s})$ satisfies

\[ \lambda_{n; k_1, \ldots, k_r; s} = \sum_{i=1}^{r} \lambda_{n+1; k_1, \ldots, k_{i-1}, k_i + 1, k_{i+1}, \ldots, k_r; s} + s \lambda_{n+1; k_1, \ldots, k_r, 2; s-1} + \lambda_{n+1; k_1, \ldots, k_r; s+1}, \tag{9} \]

where we set $s \lambda_{n+1; k_1, \ldots, k_r, 2; s-1} = 0$ when $s = 0$, although it is undefined, so
that the right-hand side makes sense. The integral representation (8) can be derived from (9) and
exchangeability arguments [37].

Möhle [24] studied the following generalization of Kingman's model [?]. Take a genetic sample
of $n$ individuals from a large population and label them as $\{1, 2, \ldots, n\}$. Suppose the
ancestral lines of these $n$ individuals evolve by the rules of a $\Lambda$-coalescent
($\Xi$-coalescent in general), and that given the genealogical tree, whose branches are the ancestral
lines of these individuals, mutations occur along the ancestral lines according to a Poisson point
process with rate $\rho > 0$. The infinitely-many-alleles model is assumed, which means that when a
gene mutates, a brand new type appears. Define a random partition of $[n]$ by declaring individuals
$i$ and $j$ to be in the same block if and only if they are of the same type, that is either $i = j$ or
there are no mutations along the ancestral lines of $i$ and $j$ before these lines coalesce. These
random partitions are exchangeable, and consistent as $n$ varies. The EPPF of this random partition
is the unique solution $p$ with $p(1) = 1$ of Möhle's recursion. In this paper, we focus on the general
case with the ancestral lines modeled by a $\Xi$-coalescent.



In order to write out Möhle's recursion [24, Theorem 5.1] for the general case in a form which fits
our treatment better, we introduce some notation. Given a generic composition
$(n_1, n_2, \ldots, n_\ell)$ of a positive integer $n$ and a collection of positive integers
$\{k_1, \ldots, k_r\}$ with $r \ge 1$, $k_1, \ldots, k_r \ge 2$ and $\sum_{j=1}^{r} k_j \le n$, let each
$n_i$ choose some of the $k_j$'s from $\{k_1, \ldots, k_r\}$. Denote the index set of those $k_j$'s
chosen by $n_i$ as

$\eta_i := \{j : k_j \text{ is chosen by } n_i\}$,

which is $\emptyset$ when $n_i$ chooses nothing. The choices must satisfy

• every $k_j$ can be chosen by only one $n_i$;

• for each $i$, $n_i \ge \sum_{j \in \eta_i} k_j$.

So every such choice can be represented by a sequence of index sets

\[ \eta = (\eta_1, \ldots, \eta_\ell). \tag{10} \]

We write the elements of $\eta_i$ as $\eta_i(1), \eta_i(2), \ldots$ and use the notation

\[ H^{k_1, \ldots, k_r}_{n_1, \ldots, n_\ell} := \{\eta^1, \eta^2, \ldots\} \tag{11} \]

to denote the set of all possible choices. We identify two choices $\eta^1$ and $\eta^2$ if for each
$i$, $\{k_j : j \in \eta^1_i\} = \{k_j : j \in \eta^2_i\}$. This eliminates some trivial repetition caused
by the indexing of $\{k_1, \ldots, k_r\}$. For example, when the three integers $k_1, k_2, k_3$ are
equal, the choice $(\{1, 2\}, \{3\})$ is considered the same as $(\{2, 3\}, \{1\})$ or
$(\{1, 3\}, \{2\})$, so the corresponding set of choices may contain only one element.
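With this notation, we write Möhle's recursion as

\[
\begin{aligned}
p(n_1, n_2, \ldots, n_\ell) ={}& \frac{q(n:1)}{n} \sum_{j : n_j = 1} p(\hat{n}_j)
 + \sum_{\{k_1, \ldots, k_r\}} q\Big(n : k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j\Big) \\
& \times \sum_{\eta \in H^{k_1, \ldots, k_r}_{n_1, \ldots, n_\ell}}
 \frac{\prod_{i=1}^{\ell} d\big(n_i; k_{\eta_i(1)}, \ldots, k_{\eta_i(|\eta_i|)}; n_i - \sum_{l=1}^{|\eta_i|} k_{\eta_i(l)}\big)}{d\big(n; k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j\big)} \\
& \times p\Big(n_1 - \sum_{l=1}^{|\eta_1|} k_{\eta_1(l)} + |\eta_1|, \ldots, n_\ell - \sum_{l=1}^{|\eta_\ell|} k_{\eta_\ell(l)} + |\eta_\ell|\Big),
\end{aligned}
\tag{12}
\]

where $(\hat{n}_j)$ is formed from $(n_1, n_2, \ldots, n_\ell)$ by deleting part $n_j$, $|\eta_i|$ is the
number of elements in $\eta_i$, and the second sum on the right-hand side is over all collections of
integers $\{k_1, \ldots, k_r\}$ with $r \ge 1$, $k_1, \ldots, k_r \ge 2$, $\sum_{i=1}^{r} k_i \le n$.
Also in (12),

\[ q(b; k_1, k_2, \ldots, k_r; s) := \frac{\Phi(b; k_1, k_2, \ldots, k_r; s)}{\Phi(b)}, \tag{13} \]

\[ q(b:1) := \frac{\Phi(b:1)}{\Phi(b)}, \tag{14} \]

where

\[ \Phi(b:1) := \rho b, \tag{15} \]

\[ \Phi(b; k_1, k_2, \ldots, k_r; s) := d(b; k_1, k_2, \ldots, k_r; s) \, \lambda_{b; k_1, k_2, \ldots, k_r; s}, \tag{16} \]

\[ \Phi(b) := \Phi(b:1) + \sum_{\{k_1, \ldots, k_r\}} \Phi(b; k_1, k_2, \ldots, k_r; s), \tag{17} \]

in which the last sum is over all multisets $\{k_1, \ldots, k_r\}$ with $r \ge 1$,
$k_1, \ldots, k_r \ge 2$, $\sum_{i=1}^{r} k_i \le b$, with $s = b - \sum_{i=1}^{r} k_i$, and the
$\lambda_{b; k_1, k_2, \ldots, k_r; s}$'s are defined by the integral (8). The meaning of this notation
is as follows: suppose that at some time $t \ge 0$ a finite $\Xi$-coalescent freezing at rate $\rho$
has $b$ active blocks; then $\Phi(b:1)$ is the total rate at which some active block freezes,
$\Phi(b; k_1, k_2, \ldots, k_r; s)$ is the total rate of a $(b; k_1, k_2, \ldots, k_r; s)$-collision, and
$\Phi(b)$ is the total rate of changing state. For the embedded discrete jump chain, $q(b:1)$ and
$q(b; k_1, k_2, \ldots, k_r; s)$ are the transition probabilities of these two kinds of events,
respectively.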

Möhle [24] derived the recursion (12) by conditioning on whether the first event met tracing
back in time from the current generation is a mutation or a collision. On the left side of (12),
$p(n_1, n_2, \ldots, n_\ell)$ is the probability of ending up with any particular partition $\pi_n$ of
the set $[n]$ into $\ell$ blocks of sizes $(n_1, n_2, \ldots, n_\ell)$. On the right side, $q(n:1)$ is
the chance that, starting from the current generation, one of the $n$ genes mutates before any
collision; for this to happen together with the specified partition of $[n]$, the individual with this
gene must be chosen from among the singletons of $\pi_n$, with chance $1/n$ for each different
choice, and after that the restriction of the coalescent process to a subset of $[n]$ of size $n - 1$
must end up generating the restriction of $\pi_n$ to that set. Similarly,
$q(n : k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j)$ is the chance that the first event met is an
$(n; k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j)$-collision. By definition, each element $\eta$ of
$H^{k_1, \ldots, k_r}_{n_1, \ldots, n_\ell}$ represents a class of ways of grouping singleton blocks to
perform an $(n; k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j)$-collision such that it is possible for the
resulting partition to have block sizes $(n_1, n_2, \ldots, n_\ell)$. Given each $\eta$, there are still
various possibilities due to the different grouping schemes inside each block of size $n_i$; this is
where the factor

\[ \frac{\prod_{i=1}^{\ell} d\big(n_i; k_{\eta_i(1)}, \ldots, k_{\eta_i(|\eta_i|)}; n_i - \sum_{l=1}^{|\eta_i|} k_{\eta_i(l)}\big)}{d\big(n; k_1, \ldots, k_r; n - \sum_{j=1}^{r} k_j\big)} \tag{18} \]

in (12) comes from. Conditioning on a particular selection, the restriction of the coalescent process
to some set of $n - \sum_{i=1}^{r} k_i + r$ lines of descent must end up generating a particular
partition of these $n - \sum_{i=1}^{r} k_i + r$ lines into sets of sizes

\[ \Big(n_1 - \sum_{l=1}^{|\eta_1|} k_{\eta_1(l)} + |\eta_1|, \ldots, n_\ell - \sum_{l=1}^{|\eta_\ell|} k_{\eta_\ell(l)} + |\eta_\ell|\Big). \tag{19} \]

The multiplication of various probabilities is justified by the strong Markov property of the
$\Xi$-coalescent at the time of the first event, and by the special symmetry property that lines of
descent representing blocks of individuals coalesce according to the same dynamics as if they were
singletons.
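
To see the recursion at work in the simplest case, the sketch below specializes it to Kingman's coalescent, where only binary collisions occur, so that the total freezing rate with $b$ active lines is $\rho b$ and the total collision rate is $\binom{b}{2}$. Under that assumption the collision term reduces to a sum over blocks of size $n_i \ge 2$ with weight $\binom{n_i}{2}/\binom{n}{2}$ and block size reduced by one. The solution is compared with the Ewens EPPF with $\theta = 2\rho$. The value of `RHO` and all function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

RHO = Fraction(3, 4)  # hypothetical freezing (mutation) rate


def q_freeze(b):
    """Probability that the first event among b active lines is a freeze."""
    return RHO * b / (RHO * b + comb(b, 2))


def q_merge(b):
    """Probability that the first event among b active lines is a binary collision."""
    return 1 - q_freeze(b)


@lru_cache(maxsize=None)
def p(comp):
    """Solve the recursion (binary-collision special case) at a composition (tuple)."""
    n = sum(comp)
    if n == 1:
        return Fraction(1)
    total = Fraction(0)
    for j, part in enumerate(comp):
        if part == 1:    # first event: a singleton of the target partition freezes
            total += q_freeze(n) / n * p(comp[:j] + comp[j + 1:])
        else:            # first event: two lines inside block j collide
            weight = Fraction(comb(part, 2), comb(n, 2))
            total += q_merge(n) * weight * p(comp[:j] + (part - 1,) + comp[j + 1:])
    return total


def ewens_eppf(comp, theta):
    """Ewens EPPF theta^l prod (n_i - 1)! / (theta (theta+1) ... (theta+n-1))."""
    value = Fraction(theta) ** len(comp)
    for part in comp:
        value *= factorial(part - 1)
    for i in range(sum(comp)):
        value /= Fraction(theta) + i
    return value


for comp in [(2,), (1, 1), (3, 1, 2), (2, 2, 1, 1)]:
    print(comp, p(comp) == ewens_eppf(comp, 2 * RHO))
```

The agreement with the Ewens formula illustrates, for small compositions, the statement that Poisson marking at rate $\theta/2$ along Kingman's coalescent yields the Ewens sampling formula with parameter $\theta$.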

As in [7], in this paper we choose to step back from the special forms (13), (14) of the $q$ entries
$q(b; k_1, k_2, \ldots, k_r; s)$ and $q(n:1)$ derived from $(\Xi, \rho)$, and analyse Möhle's recursion
(12) as an abstract relation between an array of $q$ entries and a function of compositions $p$. In
particular, we ask the following questions, quoted from [7]:

1. For which array of $q$ entries is Möhle's recursion satisfied by the EPPF $p$ of some
exchangeable random partition of $[n]$, and is this $p$ uniquely determined?

2. How can such random partitions be characterized probabilistically?

3. Can such random partitions of $[n]$ be consistent as $n$ varies for any other array of $q$ entries
besides those derived from $(\Xi, \rho)$ as above?

We stress that in the first two questions the recursion (12) is only required to hold for a single
value of $n$, while in the third question it must hold for all $n = 1, 2, \ldots$. The answer to the
first question is that for each fixed array of $q$ entries summing to 1, which will be made precise
later in Section 4, Möhle's recursion (12) determines a unique EPPF $p$ for an exchangeable random
partition of $[n]$ (Theorem 9). Answering the second question, we characterize the distribution of this
random partition in two different ways: firstly as the terminal state of a discrete-time Markovian
coalescent process, the generalized freeze-and-merge chain introduced in Section 4, and secondly
as the stationary distribution of a partition-valued Markov chain with quite a different transition
mechanism, the generalized sample-and-add chain introduced in Section 5. The answer to the third
question is positive if we restrict $n$ to some bounded range of values, for some but not all $q$ (see
Section [?]), but negative if we require consistency for all $n$ (Theorem [?]): if an infinite EPPF $p$
solves Möhle's recursion for all $n$ for some non-negative $q$ entries, then the $q$ entries must have
the form (13), (14) for some $(\Xi, \rho)$. All these results are generalized versions of the
corresponding ones in [7].

The analysis in this paper follows the same route as [7], where the authors were guided by a
remarkable parallel between the theory of finite and infinite partitions subject to Möhle's
recursion and the theory of regenerative partitions developed in [15], [16]. Many of these parallels
are summarized in [2, Section 9]. Readers can check [7] for other aspects of this idea.

3 Coalescents with freeze 

As in [7], we consider the structure of a partition of a set (respectively, of an integer) with each
of its blocks (or parts) assigned one of two possible conditions, which we call active and frozen.
We name such a combinatorial object a partially frozen partition of a set or of an integer. We use
the symbol $\Sigma^*_n$ for the pure singleton partition of $[n]$ with all blocks active, and
$\Sigma^*_\infty$ for the sequence $(\Sigma^*_n)_{n=1}^{\infty}$. We include the possibility of all
blocks being active or all blocks being frozen as special cases of partially frozen partitions.
Ignoring the conditions of the blocks of a partially frozen partition $\pi^*$ induces an ordinary
partition $\pi$. The *-shape of a partially frozen partition $\pi^*$ of $[n]$ is the corresponding
partially frozen partition of $n$, and the ordinary shape is defined in terms of the induced partition
$\pi$.

For each positive integer $n$, we denote by $\mathcal{P}^*_{[n]}$ the set of all partially frozen
partitions of $[n]$. Let $\mathcal{P}^*_\infty$ be the set of all partially frozen partitions of
$\mathbb{N}$. We identify each element $\pi^*_\infty \in \mathcal{P}^*_\infty$ with the sequence
$(\pi^*_1, \pi^*_2, \ldots) \in \mathcal{P}^*_{[1]} \times \mathcal{P}^*_{[2]} \times \cdots$, where
$\pi^*_n$ is $\pi^*_\infty|_n$, the restriction of $\pi^*_\infty$ to $[n]$. Endowing
$\mathcal{P}^*_\infty$ with the topology it inherits as a subset of
$\mathcal{P}^*_{[1]} \times \mathcal{P}^*_{[2]} \times \cdots$, the space $\mathcal{P}^*_\infty$ is
compact and metrizable. We call a random partially frozen partition of $[n]$ exchangeable if its
distribution is invariant under the action of permutations of $[n]$. Following [7], call a
$\mathcal{P}^*_\infty$-valued stochastic process $(\Pi^*_\infty(t), t \ge 0)$ a coalescent if it has
càdlàg paths and $\Pi^*_\infty(s)$ is a *-refinement of $\Pi^*_\infty(t)$ for every $s < t$, meaning
that the induced partition $\Pi_\infty(s)$ is a refinement of $\Pi_\infty(t)$ and the set of frozen
blocks of $\Pi^*_\infty(s)$ is a subset of the set of frozen blocks of $\Pi^*_\infty(t)$.

The remaining part of the section gives a formal statement about the $\Xi$-coalescent with freeze,
which is the generalization of the $\Lambda$-coalescent with freeze defined in [7]. The connection
between Möhle's model [24] and our realization has been outlined clearly in [7, Section 3].

Theorem 1. Let $(\lambda_{b; k_1, k_2, \ldots, k_r; s} : 2 \le b < \infty, r \ge 1, k_1, \ldots, k_r \ge 2, s \ge 0, b = s + \sum_{i=1}^{r} k_i)$
and $(\rho_{n,b}, 1 \le b \le n < \infty)$ be two arrays of non-negative real numbers. There exists for
each $\pi^*_\infty \in \mathcal{P}^*_\infty$ a $\mathcal{P}^*_\infty$-valued coalescent
$(\Pi^*_\infty(t), t \ge 0)$ with $\Pi^*_\infty(0) = \pi^*_\infty$, whose restriction
$(\Pi^*_n(t), t \ge 0)$ to $[n]$ is for each $n$ a Markov chain evolving by the rules:

• at each time $t \ge 0$, conditionally given $\Pi^*_n(t)$ with $b$ active blocks, each possible
$(b; k_1, k_2, \ldots, k_r; s)$-collision occurs at rate $\lambda_{b; k_1, k_2, \ldots, k_r; s}$, and

• each active block turns into a frozen block at rate $\rho_{n,b}$,

if and only if the integral representation (8) holds for some non-negative finite measure $\Xi$ on the
infinite simplex of the form $\Xi = \Xi_0 + a\delta_0$, where $\Xi_0$ has no atom at zero and
$\delta_0$ is a unit mass at zero, and $\rho_{n,b} = \rho$ for some non-negative real number $\rho$.
This $\mathcal{P}^*_\infty$-valued process $(\Pi^*_\infty(t), t \ge 0)$ directed by $(\Xi, \rho)$ is a
strong Markov process.

For $\rho = 0$, this process reduces to the $\Xi$-coalescent, and for $\rho > 0$ the process is
obtained by superposing Poisson marks at rate $\rho$ on the merger-history tree of a
$\Xi$-coalescent, and freezing the block containing $i$ at the time of the first mark along the line
of descent of $i$ in the merger-history tree.

Proof. Consistency of the rate descriptions for different $n$ implies (9), from which we have
existence of the measure $\Xi$ and the integral representation (8) by [37, Lemma 18] and
[37, Theorem 2]. Equality of the $\rho_{n,b}$'s is obvious by consistency. □

Definition 2. Call this $\mathcal{P}^*_\infty$-valued Markov process directed by a non-negative real
number $\rho$ and a non-negative finite measure $\Xi$ on the infinite simplex the $\Xi$-coalescent
freezing at rate $\rho$, or the $(\Xi, \rho)$-coalescent for short. Call a $(\Xi, \rho)$-coalescent
starting from state $\Sigma^*_\infty$ a standard $\Xi$-coalescent freezing at rate $\rho$, where
$\Sigma^*_\infty$ is the pure singleton partition with all blocks active.

Consider the finite coalescent with freeze $(\Pi^*_n(t), t \ge 0)$ which is the restriction of a
standard $\Xi$-coalescent freezing at rate $\rho$ to $[n]$. It is clear that as long as the freezing
rate $\rho$ is positive, the process $(\Pi^*_n(t), t \ge 0)$ will reach in finite time a final partition
$\mathcal{E}^*_n$ with all of its blocks in the frozen condition. Set
$\mathcal{E}^*_\infty := (\mathcal{E}^*_n)_{n=1}^{\infty}$ as the final partition of
$(\Pi^*_\infty(t), t \ge 0)$, and denote its induced partition by
$\mathcal{E}_\infty = (\mathcal{E}_n)$. Looking at the discrete chain embedded in the finite
$\Xi$-coalescent freezing at rate $\rho$, and conditioning on the first transition, we obtain the
following fact:



Theorem 3. (Möhle [24, Theorem 5.1]) The induced final partition
$\mathcal{E}_\infty = (\mathcal{E}_n)_{n=1}^{\infty}$ of a standard $\Xi$-coalescent freezing at rate
$\rho > 0$ is an exchangeable infinite random partition of $\mathbb{N}$ whose EPPF $p$ is the unique
solution of Möhle's recursion (12) with $q$ coefficients defined through $(\Xi, \rho)$ as in (13), (14).



4 Freeze-and-merge operations. 

Following [7, Section 4], given a continuous-time stochastic process $X$ with right-continuous
piecewise constant paths, the jumping process derived from $X$ is the discrete-time process

\[ \hat{X} = (\hat{X}(0), \hat{X}(1), \ldots) = (X(T_0), X(T_1), X(T_2), \ldots), \]

where $T_0 = 0$ and $T_k$ for $k \ge 1$ is the least $t > T_{k-1}$ such that
$X(t) \ne X(T_{k-1})$, if there is such a $t$, and $T_k = T_{k-1}$ otherwise. In particular, the finite
coalescent with freeze $(\Pi^*_n(t), t \ge 0)$, obtained by restriction to $[n]$ of a $\Xi$-coalescent
freezing at positive rate $\rho$, is a Markov chain with transition rate
$d(b; k_1, k_2, \ldots, k_r; s) \lambda_{b; k_1, k_2, \ldots, k_r; s}$ for a
$(b; k_1, k_2, \ldots, k_r; s)$-collision and rate $b\rho$ for a freeze, where $b$ is the number of
active blocks at time $t$ and the $d(b; k_1, k_2, \ldots, k_r; s)$'s and
$\lambda_{b; k_1, k_2, \ldots, k_r; s}$'s are as in (7) and (8); the jumping process $\hat{\Pi}^*_n$ is
then a Markov chain governed by the following freeze-and-merge operation $\mathrm{FM}_n$, which acts
on a generic partially frozen partition $\pi^*_n$ of $[n]$ as follows: if $\pi^*_n$ has $b \ge 1$
active blocks then

• with probability $q(b : k_1, k_2, \ldots, k_r; s)$ a selection of active blocks to perform a
$(b; k_1, k_2, \ldots, k_r; s)$-collision is chosen uniformly at random from the
$d(b; k_1, k_2, \ldots, k_r; s)$ possible choices, and the $(b; k_1, k_2, \ldots, k_r; s)$-collision is
then performed in the chosen way;

• with probability $q(b:1)$ an active block is chosen uniformly at random from the $b$ active blocks
and turned into a frozen block,

where the $q(b : k_1, k_2, \ldots, k_r; s)$'s and $q(b:1)$ are of the forms (13), (14). When $b = 1$,
only the second option is available. As noted in the last section, the continuous-time processes
$\Pi^*_n(t)$ are Markovian and consistent as $n$ varies, meaning that $\Pi^*_m(t)$ for $m < n$
coincides with $\Pi^*_n(t)|_m$, the restriction of $\Pi^*_n(t)$ to $[m]$.
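
The freeze-and-merge operation described above can be sketched as a small simulation. In the code below a partially frozen partition is stored as two lists of blocks (active and frozen), and the array of transition probabilities for $b$ active blocks is passed as a dictionary mapping the empty tuple to the freezing probability and each tuple $(k_1, \ldots, k_r)$ to the corresponding collision probability. This data layout, the function names, and the binary-collision example array `kingman_q` (freezing rate $\rho b$ against collision rate $\binom{b}{2}$) are assumptions made for illustration.

```python
import random


def fm_step(active, frozen, q_b, rng):
    """Apply one freeze-and-merge step to the active/frozen block lists.

    `q_b` maps () to the freeze probability and (k_1, ..., k_r) to collision
    probabilities, for b = len(active) active blocks.
    """
    types = list(q_b)
    ks = rng.choices(types, weights=[q_b[t] for t in types])[0]
    blocks = active[:]
    rng.shuffle(blocks)  # a uniform shuffle makes every admissible selection equally likely
    if ks == ():
        # Freeze: the first shuffled block is a uniformly chosen active block.
        return blocks[1:], frozen + [blocks[0]]
    merged, pos = [], 0
    for k in ks:
        # Merge the next k shuffled blocks into a single active block.
        merged.append(frozenset().union(*blocks[pos:pos + k]))
        pos += k
    return merged + blocks[pos:], frozen


def kingman_q(b, rho):
    """Example array: binary collisions only, freeze rate rho*b, merge rate b(b-1)/2."""
    if b == 1:
        return {(): 1.0}
    freeze = rho * b / (rho * b + b * (b - 1) / 2)
    return {(): freeze, (2,): 1.0 - freeze}


def final_partition(n, rho, seed=0):
    """Iterate the FM step from all-active singletons until every block is frozen."""
    rng = random.Random(seed)
    active = [frozenset([i]) for i in range(1, n + 1)]
    frozen = []
    while active:
        active, frozen = fm_step(active, frozen, kingman_q(len(active), rho), rng)
    return frozen


print(final_partition(8, rho=0.5, seed=1))
```

Each step reduces the number of active blocks by at least one, so the loop terminates after at most $n$ steps for a chain started from singletons of $[n]$.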

To view Möhle's recursion (12) in greater generality, we consider this freeze-and-merge operation
$\mathrm{FM}_n$ for $n$ some fixed positive integer, and an array

\[ q^{(n)} := (q(1:\cdot), q(2:\cdot), \ldots, q(n-1:\cdot), q(n:\cdot)) \tag{20} \]

where

\[ q(1:\cdot) := q(1:1) := 1 \tag{21} \]

and for $1 < b \le n$,

\[ q(b:\cdot) := \Big(q(b:1),\ q(b : k_1, k_2, \ldots, k_r; s) : r \ge 1,\ k_i \ge 2 \text{ for } i = 1, 2, \ldots, r, \text{ and } s = b - \sum_{i=1}^{r} k_i \ge 0\Big) \tag{22} \]

with all entries adding up to 1, where the order of the index entries $k_i$ is neglected. We always
assume that $q(b:\cdot)$ includes $q(b : k_1, k_2, \ldots, k_r; s)$ for all possible index collections
$\{k_1, k_2, \ldots, k_r\}$, supplementing zero entries where necessary.

Let $(\hat{\Pi}^*_n(k), k = 0, 1, 2, \ldots)$ be the Markov chain obtained by iterating
$\mathrm{FM}_n$ starting from $\hat{\Pi}^*_n(0) = \Sigma^*_n$. The array $q^{(n)}$ can be seen as a
generalization of the decrement matrix $q_n$ in [7, Section 4]. For completeness, we include the
propositions and lemmas listed in [7, Section 4]; most of them do not rely on the merging mechanism
of the freeze-and-merge operations. In particular, Lemma 7 is the key result for this generalized
case, and provides the basis for Theorem 9, our main result regarding finite partitions.

Observe that for $m = 1, \ldots, n$ the first $m$ entries

\[ (q(1:\cdot), q(2:\cdot), \ldots, q(m-1:\cdot), q(m:\cdot)) \tag{23} \]

of the array $q^{(n)}$ comprise another array $q^{(m)}$, which itself defines a freeze-and-merge
operation $\mathrm{FM}_m$ on partially frozen partitions of $[m]$.

Proposition 4. Given an array $q^{(n)}$ as in (20), let $(\hat{\Pi}^*_n(k), k = 0, 1, 2, \ldots)$ be a
Markov chain governed by $\mathrm{FM}_n$ starting from $\Sigma^*_n$.

(i) The $(\hat{\Pi}^*_n(k))$ chain is strictly transient; it finally reaches a partially frozen
partition $\mathcal{E}^*_n$ of $[n]$ with all blocks frozen. The same holds for the Markov chains
governed by $\mathrm{FM}_m$, $1 \le m < n$, derived from $q^{(n)}$; denote their final states by
$\mathcal{E}^*_m$, respectively. Let $\mathcal{E}_m$ be the partition of $[m]$ induced by
$\mathcal{E}^*_m$ for $1 \le m \le n$.

(ii) Define $p$ as the function on $\bigcup_{m=1}^{n} \mathcal{C}_m$ whose restriction to
$\mathcal{C}_m$ is the EPPF of $\mathcal{E}_m$. Then $p$ satisfies Möhle's recursion (12) for each
composition $(n_1, n_2, \ldots, n_\ell) \in \mathcal{C}_n$.

Part (ii) of the proposition follows by conditioning on the first transition of the
$(\hat{\Pi}^*_n(k))$ chain, similarly to [7, Lemma 4.1].

In the general setting of Proposition 4, the sequence of exchangeable final partitions
$(\mathcal{E}_m)_{m=1}^{n}$ need not be consistent with respect to restrictions. The question of what
constraints on $q^{(n)}$ can guarantee the consistency of $(\mathcal{E}_m)_{m=1}^{n}$ guided our
reasoning.

Definition 5. For an array $q^{(n)}$ as in (20) and $1 \le m < n$, call the transition operators
$\mathrm{FM}_n$ and $\mathrm{FM}_m$ derived from $q^{(n)}$ consistent if whenever $\hat{\Pi}^*_n$ is a
Markov chain governed by $\mathrm{FM}_n$, the jumping process derived from the restriction of
$\hat{\Pi}^*_n$ to $[m]$ is a Markov chain governed by $\mathrm{FM}_m$. Call the array $q^{(n)}$
consistent if this condition holds for every $1 \le m < n$.

It is clear from the consistency of the continuous-time $\Xi$-coalescent with freeze
$(\Pi^*_n(t), t \ge 0)$ introduced in the last section that for every $n$ the corresponding array
$q^{(n)}$ of the forms (13), (14) is consistent. Let $\mathrm{FM}_n(\pi^*_n)$ denote the random
partition obtained by the action of $\mathrm{FM}_n$ on an initial partially frozen partition
$\pi^*_n$ of $[n]$.

Lemma 6. Given a particular array $q^{(n)}$ as in (20):

(i) For fixed $1 \le m < n$ the transition operators $\mathrm{FM}_m$ and $\mathrm{FM}_n$ are
consistent if and only if for each partially frozen partition $\pi^*_n$ of $[n]$, there is the
equality in distribution

\[ \mathrm{FM}_m(\pi^*_n|_m) \stackrel{d}{=} \mathrm{FM}_n(\pi^*_n)\|_m, \tag{24} \]

where on the left side $\pi^*_n|_m$ is the restriction of $\pi^*_n$ to $[m]$, and on the right side
the notation $\|_m$ means the restriction to $[m]$ conditional on the event
$\mathrm{FM}_n(\pi^*_n)|_m \ne \pi^*_n|_m$ that $\mathrm{FM}_n$ freezes or merges at least one of the
blocks of $\pi^*_n$ containing some element of $[m]$.

(ii) If $\mathrm{FM}_{m-1}$ and $\mathrm{FM}_m$ are consistent for every $1 < m \le n$, then so are
$\mathrm{FM}_m$ and $\mathrm{FM}_n$ for every $1 \le m < n$; that is, $q^{(n)}$ is consistent.

The following is the consistency result for the array $q^{(n)}$, which is the generalized version of
[7, Lemma 4.4]. The omitted proof uses much the same idea as that of [7, Lemma 4.4], by looking at
$\mathrm{FM}_{b+1}$ and $\mathrm{FM}_b$ applied to $\Sigma^*_{b+1}$ and $\Sigma^*_b$ respectively,
and utilizing equation (24). Lemma 6 then provides the link between relation (24) and consistency of
$q^{(n)}$.

Lemma 7. An array $q^{(n)}$ with form (20) is consistent if and only if it satisfies the backward
recursion:

$$
\begin{aligned}
q(b : k_1, \dots, k_r; s) ={}& \sum_{i=1}^{r} \frac{(k_i+1)(l_{k_i+1}+1)}{(b+1)\, l_{k_i}}\, q(b+1 : k_1, \dots, k_{i-1}, k_i+1, k_{i+1}, \dots, k_r; s) \\
&+ \frac{2(l_2+1)}{b+1}\, q(b+1 : k_1, \dots, k_r, 2; s-1) + \frac{s+1}{b+1}\, q(b+1 : k_1, \dots, k_r; s+1) \\
&+ \frac{1}{b+1}\, q(b+1 : 1)\, q(b : k_1, \dots, k_r; s) + \frac{2}{b+1}\, q(b+1 : 2; b-1)\, q(b : k_1, \dots, k_r; s) \qquad (2 \le b < n),
\end{aligned} \tag{25}
$$

$$
q(b : 1) = \frac{b}{b+1}\, q(b+1 : 1) + \frac{1}{b+1}\, q(b+1 : 1)\, q(b : 1) + \frac{2}{b+1}\, q(b+1 : 2; b-1)\, q(b : 1) \qquad (1 \le b < n), \tag{26}
$$

where $l_j := \#\{i : k_i = j\}$, and when $s = 0$ we set $q(b+1 : k_1, \dots, k_r, 2; s-1) = 0$ even though it is
undefined, so that the right-hand side of (25) makes sense.

Consequently, each array $q(n : \cdot)$ with form (22) determines a unique consistent $q^{(n)}$.
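One step of the backward recursion can be sketched in Python as follows. This is only an illustration under stated assumptions: the coefficients are those of (25), (26) as read above, the dictionary layout (a hypothetical key `FREEZE` for $q(b:1)$ and sorted tuples $(k_1,\dots,k_r)$ for collision entries, with $s$ implied by $b$) is our own choice, and the function assumes the denominator $1 - \frac{1}{b+1}q(b+1:1) - \frac{2}{b+1}q(b+1:2;b-1)$ is nonzero.

```python
from collections import Counter
from fractions import Fraction

FREEZE = "freeze"  # hypothetical key standing for the entry q(b : 1)

def index_sets(b, smallest=2):
    """Yield nondecreasing tuples (k_1, ..., k_r), r >= 1, k_i >= 2, sum <= b."""
    for k in range(smallest, b + 1):
        yield (k,)
        for rest in index_sets(b - k, k):
            yield (k,) + rest

def step_down(q_next, b):
    """Solve (25)-(26) for q(b : .) given q(b+1 : .) stored as a dict."""
    get = lambda key: q_next.get(key, 0)
    # Move the two terms containing q(b : .) to the left-hand side.
    denom = 1 - Fraction(1, b + 1) * (get(FREEZE) + 2 * get((2,)))
    q = {FREEZE: Fraction(b, b + 1) * get(FREEZE) / denom}
    for ks in index_sets(b):
        l, s = Counter(ks), b - sum(ks)
        # Term with one more singleton: q(b+1 : k_1..k_r; s+1).
        total = Fraction(s + 1, b + 1) * get(ks)
        # Terms where one index k_i is increased by one.
        for i, k in enumerate(ks):
            bumped = tuple(sorted(ks[:i] + (k + 1,) + ks[i + 1:]))
            total += Fraction((k + 1) * (l[k + 1] + 1), (b + 1) * l[k]) * get(bumped)
        # Term with an extra binary collision; treated as 0 when s = 0.
        if s >= 1:
            total += Fraction(2 * (l[2] + 1), b + 1) * get(tuple(sorted(ks + (2,))))
        q[ks] = total / denom
    return q
```

As a sanity check under an assumed toy model (no freezing; at each event every active block is thrown independently into one of two boxes with probability 1/2 each, and blocks sharing a box merge), the array for $b = 5$ is `{(5,): 1/16, (4,): 5/16, (2, 3): 5/8}`, and `step_down(q5, 4)` returns the corresponding array for four blocks, whose entries sum to one.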

With Proposition 4 we have

Lemma 8. For $1 \le m \le n$, let $\Pi_m$ be the final partition of the $\mathrm{FM}_m$-chain starting in state $\Sigma^*_m$
as defined in Proposition 4. If the array $q^{(n)}$ is consistent then the finite sequence of exchangeable
random set partitions $(\Pi_m)_{m=1}^{n}$ is consistent in the sense that

$$\Pi_m \overset{d}{=} \Pi_n|_m .$$

The finite EPPF $p$ of $(\Pi_m)_{m=1}^{n}$ then satisfies Möhle's recursion (12) for all compositions of $m \le n$
on the left-hand side.

Here is our principal result regarding finite partitions satisfying (12), which parallels [7,
Theorem 4.6].

Theorem 9. For a positive integer $n > 1$ and an arbitrary array $q(n : \cdot)$ with form (22):

(i) there exists a unique finite EPPF $p$ for a consistent sequence of random set partitions $(\Pi_m)_{m=1}^{n}$
which satisfies Möhle's recursion (12) for all compositions of $n$;

(ii) this finite EPPF $p$ satisfies Möhle's recursion (12) for all compositions of positive integers
$m < n$ with coefficient arrays $q(m : \cdot)$ derived from $q(n : \cdot)$ by the recursion (25), (26);

(iii) for each $1 \le m \le n$ the distribution of $\Pi_m$ determined by the restriction of this EPPF $p$ to
compositions of $m$ is that of the final partition of the $\mathrm{FM}_m$ Markov chain with array $q^{(m)}$
defined by (ii), starting from state $\Sigma^*_m$.

Proof. Given $q(n : \cdot)$, we can define a consistent array $q^{(n)}$ by the backward recursion (25), (26).
Then use $q^{(n)}$ to build a sequence of Markov chains: for each $m$, the chain $(\Pi^*_m(k),\ k = 0, 1, 2, \dots)$
starts from $\Sigma^*_m$ and evolves according to $\mathrm{FM}_m$. By Lemma 8 the sequence of induced final partitions
$(\Pi_m)_{m=1}^{n}$ of these chains has an EPPF $p$ which satisfies recursion (12). Hence the existence part of (i)
follows. The uniqueness in part (i) can be read from results in the next section. The assertions (ii)
and (iii) follow directly from this construction. □



5 The sample-and-add operation. 

Following [7, Section 5], we give Möhle's recursion (12) another interpretation as the system of
equations for the invariant probability measure of a particular Markov transition mechanism on
partitions of $[n]$, the generalized sample-and-add operation, which includes the operations in [7, Section
5] as special cases. As a consequence, the uniqueness in Theorem 9 follows from the uniqueness of
this invariant probability distribution.

Fix some positive integer $n$ and a sequence

$$
q(n : \cdot) := \Bigl\{ q(n : 1),\ q(n : k_1, k_2, \dots, k_r; s) \ :\ r \ge 1,\ k_i \ge 2 \text{ for } i = 1, 2, \dots, r, \text{ and } s = n - \sum_{i=1}^{r} k_i \ge 0 \Bigr\} \tag{27}
$$

with all entries summing to 1, where the order of the indices $k_i$ is disregarded. We always assume that
$q(n : \cdot)$ includes an entry $q(n : k_1, k_2, \dots, k_r; s)$ for every possible index set $\{k_1, k_2, \dots, k_r\}$, supplementing
zero entries where necessary. Let $K_n$ be a random element distributed according to this sequence $q(n : \cdot)$,
i.e. $K_n$ equals 1 with probability $q(n : 1)$, and equals the set $\{k_1, \dots, k_r\}$ with probability
$q(n : k_1, k_2, \dots, k_r; s)$.

Consider the following sample-and-add random operation on $\mathcal{P}_{[n]}$, denoted $\mathrm{SA}_n$. We regard
a generic random partition $\Pi_n$ of $[n]$ as a random allocation of balls labeled $1, \dots, n$ to some set of
nonempty boxes, which the operation $\mathrm{SA}_n$ transforms into some other random allocation $\Pi'_n$. Given
$\Pi_n = \pi_n$,

• if $K_n = 1$, first delete a single ball picked uniformly at random from the balls allocated
according to $\pi_n$, to make an intermediate partition of some set of $n - 1$ balls, then add to this
intermediate partition a single box containing the deleted ball.

• if $K_n = \{k_1, \dots, k_r\}$, first pick out a sequence of $k_1 - 1$ of the $n$ balls from $\pi_n$ by uniform
random sampling without replacement and put these $k_1 - 1$ balls together as set #1; continue
to pick out a sequence of $k_2 - 1$ of the remaining $n - k_1 + 1$ balls by uniform random sampling
without replacement and put these $k_2 - 1$ balls together as set #2; continue in this way until we
have $r$ sets of balls marked #1, #2, ..., #r.

Then mark a ball chosen uniformly from the remaining $n - \sum_{i=1}^{r} k_i + r$ balls as ball #1,
continue to mark a ball chosen uniformly from the remaining $n - \sum_{i=1}^{r} k_i + r - 1$ unmarked balls
as ball #2, and continue in this way until $r$ balls are marked. Now for each $i = 1, 2, \dots, r$, add all balls
in set #i into the box containing ball #i.

In either case delete empty boxes in case any appear after the sampling step. The resulting partition
of $[n]$ is $\Pi'_n$. For each $q(n : \cdot)$, this defines a Markovian transition operator $\mathrm{SA}_n$ on partitions of $[n]$.
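The two cases of the operation can be sketched as a small simulation. This is a minimal sketch, not the paper's construction verbatim: the function name `sample_and_add`, the representation of a partition as a list of Python sets, and passing the realized value of $K_n$ as an argument (rather than drawing it from $q(n:\cdot)$) are all assumptions made for illustration. Shuffling the ball labels once and reading off consecutive slices is used as an equivalent of sequential uniform sampling without replacement.

```python
import random

def sample_and_add(partition, K, rng=random):
    """Apply one SA_n step to a partition of {1, ..., n}.

    partition: list of disjoint non-empty sets (the boxes).
    K: the integer 1, or a list [k_1, ..., k_r] with k_i >= 2 and sum(K) <= n.
    """
    boxes = [set(box) for box in partition]  # work on a copy
    balls = sorted(ball for box in boxes for ball in box)

    def box_of(ball):
        return next(box for box in boxes if ball in box)

    if K == 1:
        # Delete one uniformly chosen ball and put it in a new box of its own.
        ball = rng.choice(balls)
        box_of(ball).remove(ball)
        boxes.append({ball})
    else:
        assert sum(K) <= len(balls)
        rng.shuffle(balls)  # uniform order = sampling without replacement
        moved_sets, pos = [], 0
        for k in K:  # set #i consists of k_i - 1 sampled balls
            moved_sets.append(balls[pos:pos + k - 1])
            pos += k - 1
        marked = balls[pos:pos + len(K)]  # ball #i, chosen among unsampled balls
        for moved in moved_sets:
            for ball in moved:
                box_of(ball).remove(ball)
        for moved, mark in zip(moved_sets, marked):
            box_of(mark).update(moved)  # add set #i to the box containing ball #i
    return [box for box in boxes if box]  # delete empty boxes
```

For instance, applying the step with `K = [3]` to the partition of $\{1, 2, 3\}$ into singletons always yields the one-block partition, while `K = 1` leaves that all-singleton partition unchanged.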

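Lemma 10. Let $\Pi_n$ be an exchangeable random partition of $[n]$ with finite EPPF $p$ defined as
a function of compositions of $m$ for $1 \le m \le n$. Let $\Pi'_n$ be derived from $\Pi_n$ by the $\mathrm{SA}_n$ operation
determined by some sequence $q(n : \cdot)$ as in (27). Then $\Pi'_n$ is an exchangeable random partition of $[n]$
whose EPPF $p'$ is determined on compositions of $n$ by the formula

$$
\begin{aligned}
p'(n_1, n_2, \dots, n_\ell) ={}& \frac{q(n : 1)}{n} \sum_{j : n_j = 1} p(\hat{n}_j)
+ \sum_{\{k_1, \dots, k_r\}} \frac{q\bigl(n : k_1, \dots, k_r; n - \sum_{j=1}^{r} k_j\bigr)}{d\bigl(n; k_1, \dots, k_r; n - \sum_{j=1}^{r} k_j\bigr)} \\
&\times \sum_{\eta \in H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}} \Bigl[ \prod_{i=1}^{\ell} d\bigl(n_i; k_{\eta_i(1)}, \dots, k_{\eta_i(|\eta_i|)}; n_i - \textstyle\sum_{j} k_{\eta_i(j)}\bigr) \Bigr]\,
p\Bigl(n_1 - \sum_{j} k_{\eta_1(j)} + |\eta_1|, \dots, n_\ell - \sum_{j} k_{\eta_\ell(j)} + |\eta_\ell|\Bigr)
\end{aligned} \tag{28}
$$

where $(\hat{n}_j)$ is formed from $(n_1, n_2, \dots, n_\ell)$ by deleting part $n_j$, $|\eta_i|$ is the number of elements in $\eta_i$,
and the second sum on the right-hand side is over all integer sets $\{k_1, \dots, k_r\}$ with $r \ge 1$, $k_1, \dots, k_r \ge 2$,
$\sum_{i=1}^{r} k_i \le n$.

(Note that the right side of (28) is identical to the right side of Möhle's recursion (12).)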
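Proof. Let $K_n$ be a random element distributed according to the sequence $q(n : \cdot)$, i.e.
$K_n$ equals 1 with probability $q(n : 1)$, and equals the set $\{k_1, \dots, k_r\}$ with probability
$q(n : k_1, k_2, \dots, k_r; s)$. For each partition $\pi'_n$ of $[n]$ we can compute

$$
\mathbb{P}(\Pi'_n = \pi'_n) = q(n : 1)\, \mathbb{P}(\Pi'_n = \pi'_n \mid K_n = 1)
+ \sum_{\{k_1, \dots, k_r\}} q\Bigl(n : k_1, \dots, k_r; n - \sum_{j=1}^{r} k_j\Bigr)\, \mathbb{P}\bigl(\Pi'_n = \pi'_n \mid K_n = \{k_1, \dots, k_r\}\bigr). \tag{29}
$$

Assuming that $\pi'_n$ has boxes of sizes $n_1, \dots, n_\ell$, and that the $\mathrm{SA}_n$ operation acts on an exchangeable
$\Pi_n$ with EPPF $p$, we deduce (28) from (29) and

$$
\mathbb{P}(\Pi'_n = \pi'_n \mid K_n = 1) = \frac{1}{n} \sum_{j : n_j = 1} p(\hat{n}_j), \tag{30}
$$

$$
\begin{aligned}
\mathbb{P}\bigl(\Pi'_n = \pi'_n \mid K_n = \{k_1, \dots, k_r\}\bigr) ={}& \frac{1}{d\bigl(n; k_1, \dots, k_r; n - \sum_{j=1}^{r} k_j\bigr)}
\sum_{\eta \in H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}} \Bigl[ \prod_{i=1}^{\ell} d\bigl(n_i; k_{\eta_i(1)}, \dots, k_{\eta_i(|\eta_i|)}; n_i - \textstyle\sum_{j} k_{\eta_i(j)}\bigr) \Bigr] \\
&\times p\Bigl(n_1 - \sum_{j} k_{\eta_1(j)} + |\eta_1|, \dots, n_\ell - \sum_{j} k_{\eta_\ell(j)} + |\eta_\ell|\Bigr).
\end{aligned} \tag{31}
$$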

First consider (31). By the definition, every $\eta$ in $H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}$ induces a class of ways to allocate sets of
sampled balls into boxes of $\pi'_n$ with which the result $(\Pi'_n = \pi'_n)$ is possible after the $\mathrm{SA}_n$ operation.
With the particular class of allocation indicated by $\eta$, the actual sequence of labels of sampled and
marked balls, in order of choice, can be any one of

$$\prod_{i=1}^{\ell} \frac{n_i!}{\bigl(n_i - \sum_{j} k_{\eta_i(j)}\bigr)!}$$

sequences, out of

$$\frac{n!}{\bigl(n - \sum_{i=1}^{r} k_i\bigr)!}$$

total number of possibilities.

Given any particular order of sampling and marking balls, let $M$ be the set of labels
of the $\sum_{i=1}^{r} k_i - r$ balls that are moved. Then the event $(\Pi'_n = \pi'_n)$ occurs if and only if the
restriction of $\Pi_n$ to $[n] - M$ equals the restriction of $\pi'_n$ to $[n] - M$, which is
a particular partition of $n - \sum_{i=1}^{r} k_i + r$ labeled balls into boxes of $\tilde{n}_1, \dots, \tilde{n}_\ell$ balls, where
$\tilde{n}_i = n_i \mathbf{1}(|\eta_i| = 0) + \bigl(n_i - \sum_{j} k_{\eta_i(j)} + |\eta_i|\bigr) \mathbf{1}(|\eta_i| \ne 0)$. The conditional probability of $(\Pi'_n = \pi'_n)$, given
$K_n$ equal to some $\{k_1, \dots, k_r\}$ and a particular order of sampling and marking balls with which
the $\mathrm{SA}_n$ operation is performed, is therefore

$$p\Bigl(n_1 - \sum_{j} k_{\eta_1(j)} + |\eta_1|, \dots, n_\ell - \sum_{j} k_{\eta_\ell(j)} + |\eta_\ell|\Bigr)$$

by the assumed exchangeability of $\Pi_n$, and the definition of the EPPF $p$ of $\Pi_n$ on compositions of
$m < n$ by restriction of $\Pi_n$ to subsets of size $m$.

Also, by the definition of $H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}$, some choices of $\eta$ are counted once although they appear
different here because of the labeling of elements in the set $\{k_1, \dots, k_r\}$; e.g. $(\{1, 2\}, \{3\})$, $(\{2, 3\}, \{1\})$
and $(\{1, 3\}, \{2\})$ are counted as one element of $H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}$. Here, however, since the balls are labeled according to
the order in which they are sampled and marked, the repetitions eliminated in defining $H^{(n_1, \dots, n_\ell)}_{\{k_1, \dots, k_r\}}$ actually
lead to different ways of sampling and adding, so we account for these by multiplying by

$$\prod_{j=2}^{n} \binom{l_j}{l_j^{\eta_1},\ l_j^{\eta_2},\ \dots,\ l_j^{\eta_\ell}}$$

where $l_j = \#\{i : k_i = j\}$ and $l_j^{\eta_m} = \#\{i : k_{\eta_m(i)} = j\}$ for $j = 2, \dots, n$ and $m = 1, \dots, \ell$.

Now the evaluation (31) is apparent, and (30) too is apparent by a similar but easier argument.

□

The following proposition follows from the above lemma and an argument similar to that in [7].

Proposition 11. For each sequence $q(n : \cdot)$ as in (27), the corresponding $\mathrm{SA}_n$ transition operator
on partitions of $[n]$ has a unique stationary distribution. A random partition with this stationary
distribution is exchangeable, and its EPPF is the unique finite EPPF $p$ that satisfies Möhle's recursion
(12), that is, (28) with $p' = p$.



6 Infinite partitions 

As in [7, Section 6], in this section we pass from finite partitions to the projective limit, and
arrive at the desired integral representation of an infinite array $q^{\infty}$ satisfying recursion (25), (26). We
obtain the main result, which is the infinite counterpart of Theorem 9 and can be seen as a generalized
version of [7, Theorem 6.2].

An infinite sequence of freeze-and-merge operations $\mathrm{FM} := (\mathrm{FM}_n, n = 1, 2, \dots)$ which satisfies
the condition in Definition 5 for all positive integers $1 \le m < n < \infty$ is called consistent. For each
$n = 1, 2, \dots$ the Markov chain starting from $\Sigma^*_n$ and driven by $\mathrm{FM}_n$ terminates with an induced final
partition $\Pi_n$. These comprise an infinite partition $\Pi_\infty := (\Pi_n)_{n=1}^{\infty}$ which we call the final partition
associated with the consistent infinite $\mathrm{FM}$.

Lemma 12. For every infinite array

$$q^{\infty} := (q(1 : \cdot),\ q(2 : \cdot),\ \dots,\ q(n : \cdot),\ \dots) \tag{32}$$

where

$$q(1 : \cdot) = \{q(1 : 1)\} = \{1\} \tag{33}$$

and for $b > 1$, $q(b : \cdot)$ are as in (22), with entries satisfying the recursion (25), (26), there exist a
non-negative finite measure $\Xi$ on the infinite simplex of the form $\Xi = \Xi_0 + a\delta_0$, where $\Xi_0$ has no
atom at zero and $\delta_0$ is a unit mass at zero, and a non-negative real number $\rho$ such that the entries
of $q^{\infty}$ can be represented by $(\Xi, \rho)$ as in (13), (14). The data $(\Xi, \rho)$ are unique up to a positive factor.

Proof. Suppose $q^{\infty}$ solves the recursion (25), and suppose $q(2 : 2; 0) < 1$. Let $\Phi(n)$, $n = 1, 2, \dots$
satisfy

$$\frac{\Phi(n)}{\Phi(n+1)} = 1 - \frac{1}{n+1}\, q(n+1 : 1) - \frac{2}{n+1}\, q(n+1 : 2; n-1) \tag{34}$$

for $n \ge 1$; since the right-hand side is strictly positive this recursion has a unique solution with some
given initial value $\Phi(1) = \rho$, where $\rho > 0$. For each $q(n : k_1, k_2, \dots, k_r; s)$ set

$$\lambda_{n; k_1, k_2, \dots, k_r; s} := \Phi(n)\, \frac{q(n : k_1, k_2, \dots, k_r; s)}{d(n; k_1, k_2, \dots, k_r; s)};$$

then from (34) and (25) we can derive

$$\lambda_{n; k_1, k_2, \dots, k_r; s} = \sum_{i=1}^{r} \lambda_{n+1; k_1, \dots, k_{i-1}, k_i+1, k_{i+1}, \dots, k_r; s} + s\, \lambda_{n+1; k_1, \dots, k_r, 2; s-1} + \lambda_{n+1; k_1, \dots, k_r; s+1}.$$

Then by [37, Lemma 18] and [37, Theorem 2] we conclude (5), hence (13) holds for some non-negative
finite measure $\Xi$ on the infinite simplex of the form $\Xi = \Xi_0 + a\delta_0$, where $\Xi_0$ has no atom at zero
and $\delta_0$ is a unit mass at zero.
From (26) we find

$$\frac{\Phi(1)\, q(1 : 1)}{1} = \dots = \frac{\Phi(n)\, q(n : 1)}{n} = \cdots,$$

hence by setting $\Phi(n : 1) := \rho n$ we deduce (14). For the special case $q(2 : 2; 0) = 1$, it is easy to
observe that $\rho = 0$, and we get $\Xi = \delta_0$ by similar analysis. □
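The way (34) pins down $\Phi(n)$ from the freeze and binary-collision entries can be checked numerically. The sketch below is illustrative only: the function name `total_rates` and the dictionaries `q_freeze[n]` $= q(n:1)$ and `q_pair[n]` $= q(n:2;n-2)$ are hypothetical, and the test case (each block freezes at rate $\rho$ and each pair of blocks merges at rate 1, so that $\Phi(n) = n\rho + n(n-1)/2$) is an assumed Kingman-type example rather than something taken from the text.

```python
from fractions import Fraction

def total_rates(q_freeze, q_pair, rho, n_max):
    """Solve recursion (34) for Phi(1), ..., Phi(n_max) with Phi(1) = rho.

    q_freeze[n] stands for q(n : 1) and q_pair[n] for q(n : 2; n - 2),
    both given for 2 <= n <= n_max.
    """
    phi = {1: rho}
    for n in range(1, n_max):
        # Right-hand side of (34): Phi(n) / Phi(n + 1).
        ratio = (1
                 - Fraction(1, n + 1) * q_freeze[n + 1]
                 - Fraction(2, n + 1) * q_pair[n + 1])
        phi[n + 1] = phi[n] / ratio
    return phi

# Assumed example: freeze rate rho per block, merge rate 1 per pair of blocks.
rho = Fraction(1, 2)
Phi = lambda n: n * rho + Fraction(n * (n - 1), 2)
q_freeze = {n: n * rho / Phi(n) for n in range(2, 7)}
q_pair = {n: Fraction(n * (n - 1), 2) / Phi(n) for n in range(2, 7)}
phi = total_rates(q_freeze, q_pair, rho, 6)
```

In this example the recovered values agree with $n\rho + n(n-1)/2$, and $\Phi(n)\,q(n:1)/n$ is constant in $n$ and equal to $\rho$, in line with the last display of the proof.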



Combining this lemma with the previous results, we obtain the following result, which is the
counterpart of [7, Theorem 6.2]:

Theorem 13. Let $\Pi_\infty := (\Pi_n)_{n=1}^{\infty}$ be a nontrivial exchangeable random partition of $\mathbb{N}$, that is, one different
from the trivial one-block partition. The following are equivalent:

(i) The EPPF $p$ satisfies Möhle's recursion (12) with some infinite array $q^{\infty}$ with form (32).

(ii) This array is representable as (13), (14) by some nontrivial $(\Xi, \rho)$ which is unique up to a
positive factor, as claimed in Lemma 12.

(iii) This $\Pi_\infty$ is induced by the final partition of some standard $\Xi$-coalescent freezing at rate $\rho$.

(iv) This $\Pi_\infty$ is the final partition of some consistent infinite $\mathrm{FM}$ operation.

Finally, we complete this paper with the following uniqueness assertion, similar to [7, Lemma
6.3]:

Lemma 14. The correspondence $q^{\infty} \mapsto p$ between infinite arrays with $q(2 : 1) > 0$ satisfying
consistency (25), (26) and the EPPFs is bijective.

Proof. We only need to show that $p$ uniquely determines $q^{\infty}$. For general infinite partitions,
$q(2 : 1) = p(1, 1) > 0$ implies that $p(1, 1, \dots, 1) > 0$. By Lemma 8, $p$ must solve Möhle's recursion (12), so

$$p(1, \dots, 1) = q(n : 1)\, q(n-1 : 1) \cdots q(2 : 1)$$

shows that the $q(n : 1)$'s are uniquely determined by $p$. By exploiting the formula

$$
p(m, 1, \dots, 1) = \frac{q(n : m; n-m)}{\binom{n}{m}}\, p(1, \dots, 1) + q(n : 1)\, \frac{n-m}{n}\, p(m, 1, \dots, 1)
+ \sum_{k=2}^{m-1} q(n : k; n-k)\, \frac{\binom{m}{k}}{\binom{n}{k}}\, p(m-k+1, 1, \dots, 1) \tag{35}
$$

(where the left-hand side is evaluated at a composition of $n$, and the three values of $p$ on the right-hand side
at compositions of $n-m+1$, $n-1$ and $n-k+1$ respectively)
with induction on $m = 2, 3, \dots, n-1$, it is clear that entries of the form $q(n : m; n-m)$, $2 \le m \le n-1$,
are also uniquely determined by $p$. Similarly, if we look at equation (12) with $p(m, l, 1, \dots, 1)$,
$2 \le m, l \le n$, $m + l \le n$ on the left-hand side, by induction on $m, l$ we can deduce that entries of the
form $q(n : m, l; n-m-l)$ are uniquely determined by $p$ as well. The same mechanism can be carried
on to conclude that all entries of $q^{\infty}$ are uniquely determined by $p$.
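The first step of this argument, reading the freeze probabilities off the values $p(1, \dots, 1)$, amounts to taking successive ratios. A minimal sketch follows; the function name `freeze_probabilities` and the input layout `p_ones[n]` (the value of $p$ at the composition of $n$ into $n$ ones, with `p_ones[1] = 1`) are hypothetical, and the values are assumed strictly positive, as in the case $q(2:1) > 0$ treated here.

```python
from fractions import Fraction

def freeze_probabilities(p_ones):
    """Recover q(n : 1) for n >= 2 from p(1, ..., 1).

    Uses p(1, ..., 1) = q(n : 1) q(n-1 : 1) ... q(2 : 1), so that
    q(n : 1) is the ratio of consecutive values of p_ones.
    """
    return {n: p_ones[n] / p_ones[n - 1] for n in sorted(p_ones) if n >= 2}

# Assumed example: q(n : 1) = theta / (theta + n - 1), an Ewens-type choice.
theta = Fraction(3, 2)
p_ones = {1: Fraction(1)}
for n in range(2, 8):
    p_ones[n] = p_ones[n - 1] * theta / (theta + n - 1)
q_rec = freeze_probabilities(p_ones)
```

The remaining entries are recovered in the same spirit by solving (35) and its analogues for the single unknown entry at each induction step.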

The uniqueness fails when $q(2 : 1) = 0$, which corresponds to the case with no freezing: in that
case the singleton partition will be the final partition regardless of the coalescing scheme. □